Mathematics Of Machine Learning
Paper 1, Section II, J
Part II, 2020

(a) Let $X_1, \ldots, X_n$ be i.i.d. random elements taking values in a set $\mathcal{X}$ and let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to [0, 1]$. Define the Rademacher complexity $\mathcal{R}_n(\mathcal{F})$. Write down an inequality relating the Rademacher complexity and

$$\mathbb{E}\bigg(\sup_{f \in \mathcal{F}} \bigg|\frac{1}{n}\sum_{i=1}^n f(X_i) - \mathbb{E}f(X_1)\bigg|\bigg).$$
State the bounded differences inequality.
(b) Now given i.i.d. input-output pairs $(X_1, Y_1), \ldots, (X_n, Y_n) \in \mathcal{X} \times \{-1, 1\}$, consider performing empirical risk minimisation with misclassification loss over a class $\mathcal{H}$ of classifiers $h : \mathcal{X} \to \{-1, 1\}$. Denote by $\hat{h}$ the empirical risk minimiser [which you may assume exists]. For any classifier $h$, let $R(h) = \mathbb{P}(h(X_1) \neq Y_1)$ be its misclassification risk, and suppose this is minimised over $\mathcal{H}$ by $h^* \in \mathcal{H}$. Prove that with probability at least $1 - \delta$,

$$R(\hat{h}) - R(h^*) \leq 4\mathcal{R}_n(\mathcal{F}) + \sqrt{\frac{2\log(1/\delta)}{n}}$$

for $\delta \in (0, 1]$, where $\mathcal{F}$ is a class of functions related to $\mathcal{H}$ that you should specify.
(c) Let $Z_i = (X_i, Y_i)$ for $i = 1, \ldots, n$. Define the empirical Rademacher complexity $\hat{\mathcal{R}}(\mathcal{F})$. Show that with probability at least $1 - \delta$,

$$\mathcal{R}_n(\mathcal{F}) \leq \hat{\mathcal{R}}(\mathcal{F}) + \sqrt{\frac{\log(1/\delta)}{2n}}.$$
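The empirical Rademacher complexity in part (c) is computable from data, and it can be instructive to estimate it by Monte Carlo. The following is a minimal sketch (not part of the exam): the helper name `empirical_rademacher` and the threshold-classifier example are illustrative assumptions, and the expectation over the Rademacher signs is approximated by sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher(F_values, n_mc=2000, rng=rng):
    """Monte Carlo estimate of the empirical Rademacher complexity
    E_eps sup_{f in F} | (1/n) sum_i eps_i f(x_i) |  (conditional on the data),
    where F_values is an (m, n) array whose rows are (f(x_1), ..., f(x_n))
    for the m functions f in a finite class F."""
    m, n = F_values.shape
    total = 0.0
    for _ in range(n_mc):
        eps = rng.choice([-1.0, 1.0], size=n)        # Rademacher signs
        total += np.max(np.abs(F_values @ eps)) / n  # sup over the class
    return total / n_mc

# Illustrative example: the misclassification-loss class induced by
# threshold classifiers h_a(x) = 1{x <= a} on a small sample.
x = rng.normal(size=50)
y = rng.integers(0, 2, size=50)
thresholds = np.linspace(-2.0, 2.0, 41)
F = np.array([((x <= a).astype(int) != y).astype(float) for a in thresholds])
print(empirical_rademacher(F))
```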
Paper 2, Section II, J
Part II, 2020

(a) Let $\mathcal{H}$ be a family of functions $h : \mathcal{X} \to \{0, 1\}$. What does it mean for a finite set $D \subseteq \mathcal{X}$ to be shattered by $\mathcal{H}$? Define the shattering coefficient $s(\mathcal{H}, n)$ and the VC dimension $\mathrm{VC}(\mathcal{H})$ of $\mathcal{H}$.
Let $\mathcal{A} = \{(-\infty, a] : a \in \mathbb{R}\}$ and set $\mathcal{H} = \{\mathbf{1}_A : A \in \mathcal{A}\}$. Compute $\mathrm{VC}(\mathcal{H})$.
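Shattering of small sets can be verified by brute force, which is a useful sanity check on the definitions. A minimal sketch, assuming the threshold class reconstructed above; `shatters` is an illustrative helper, not from the exam.

```python
import numpy as np

def shatters(points, classifiers):
    """Check whether the finite set `points` is shattered by `classifiers`:
    every possible 0/1 labelling must be realised by some classifier."""
    realised = {tuple(int(h(x)) for x in points) for h in classifiers}
    return len(realised) == 2 ** len(points)

# Classifiers x -> 1{x <= a} over a grid of thresholds a.
H = [lambda x, a=a: x <= a for a in np.linspace(-3.0, 3.0, 601)]

print(shatters([0.0], H))        # True: a single point is shattered, so VC >= 1
print(shatters([0.0, 1.0], H))   # False: labelling (0, 1) needs 1 <= a < 0
```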
(b) State the Sauer-Shelah lemma.
(c) Let $\mathcal{H}_1, \ldots, \mathcal{H}_r$ be families of functions $h : \mathcal{X} \to \{0, 1\}$, each with finite VC dimension at most $v \geq 1$. Now suppose $D = \{x_1, \ldots, x_m\}$ is shattered by $\mathcal{H}_1 \cup \cdots \cup \mathcal{H}_r$. Show that

$$2^m \leq r(m+1)^v.$$

Conclude that for $r \geq 2$,

$$\mathrm{VC}(\mathcal{H}_1 \cup \cdots \cup \mathcal{H}_r) \leq 4v\log_2(4v) + 2\log_2 r.$$

[You may use without proof the fact that if $x \geq 4a\log_2(2a) + 2b$ with $a \geq 1$ and $b > 0$, then $x \geq a\log_2(x) + b$ for all such $x$.]
(d) Now let $\mathcal{A}$ be the collection of subsets of $\mathbb{R}^p$ of the form of a product of intervals $I_1 \times \cdots \times I_p$, where exactly $d$ of the intervals $I_j$ are of the form $(-\infty, a_j]$ for some $a_j \in \mathbb{R}$ and the remaining $p - d$ intervals are $\mathbb{R}$. Set $\mathcal{H} = \{\mathbf{1}_A : A \in \mathcal{A}\}$. Show that when $1 \leq d \leq p$,

$$\mathrm{VC}(\mathcal{H}) \leq 4d\log_2(4d) + 2d\log_2 p.$$
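For intuition on the shattering-coefficient bound driving parts (b)-(d), one can count realised dichotomies directly in low dimension. The sketch below (illustrative, not from the exam) counts labellings of random points realised by lower orthants $(-\infty, a_1] \times (-\infty, a_2]$ in $\mathbb{R}^2$, a class of VC dimension 2, and compares against the $(n+1)^v$ form of the Sauer-Shelah bound.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

def dichotomies_lower_orthants(X, grid):
    """Count the distinct 0/1 labellings of the rows of X realised by
    indicators of lower orthants (-inf, a_1] x ... x (-inf, a_d]."""
    labellings = set()
    d = X.shape[1]
    for a in product(grid, repeat=d):
        labellings.add(tuple((X <= np.array(a)).all(axis=1).astype(int)))
    return len(labellings)

n, d = 6, 2
X = rng.normal(size=(n, d))
count = dichotomies_lower_orthants(X, np.linspace(-3.0, 3.0, 13))
print(count, "<=", (n + 1) ** d)   # Sauer-Shelah: s(H, n) <= (n + 1)^VC(H)
```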
Paper 4, Section II, J
Part II, 2020

Suppose we have input-output pairs $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^p \times \{-1, 1\}$. Consider the empirical risk minimisation problem with hypothesis class

$$\mathcal{H} = \{h_\beta : h_\beta(x) = x^T\beta, \ \beta \in C\},$$

where $C$ is a non-empty closed convex subset of $\mathbb{R}^p$, and logistic loss

$$\ell(h, y) = \log_2(1 + e^{-yh})$$

for $h \in \mathbb{R}$ and $y \in \{-1, 1\}$.
(i) Show that the objective function

$$f(\beta) = \frac{1}{n}\sum_{i=1}^n \ell(x_i^T\beta, y_i)$$

of the optimisation problem is convex.
(ii) Let $\pi_C(z)$ denote the projection of $z \in \mathbb{R}^p$ onto $C$. Describe the procedure of stochastic gradient descent (SGD) for minimisation of $f$ above, giving explicit forms for any gradients used in the algorithm.
(iii) Suppose $\beta^*$ minimises $f$ over $C$. Suppose $\max_i \|x_i\|_2 \leq L$ and $\sup_{\beta \in C} \|\beta\|_2 \leq R$. Prove that the output $\bar{\beta}$ of $k$ iterations of the SGD algorithm with some fixed step size $\eta$ (which you should specify) satisfies

$$\mathbb{E}f(\bar{\beta}) - f(\beta^*) \leq \frac{RL}{\sqrt{k}}.$$
(iv) Now suppose that the step size at iteration $i$ is $\eta_i = R/(L\sqrt{i})$ for each $i = 1, \ldots, k$. Show that, writing $\beta_i$ for the $i$th iterate of SGD, we have

$$\mathbb{E}f(\bar{\beta}) - f(\beta^*) \leq \frac{3RL}{\sqrt{k}},$$

where

$$\bar{\beta} = \frac{1}{k}\sum_{i=1}^k \beta_i.$$
[You may use standard properties of convex functions and projections onto closed convex sets without proof, provided you state them.]
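The projected SGD procedure of part (ii) is straightforward to prototype. Below is a minimal sketch under stated assumptions: the choice $C = \{\beta : \|\beta\|_2 \leq R\}$, the helper names, and the synthetic data are all illustrative, and for simplicity the sketch uses the natural-log version of the logistic loss, which only rescales the objective by a constant factor.

```python
import numpy as np

rng = np.random.default_rng(2)

def project_l2_ball(beta, R):
    """Euclidean projection onto C = {b in R^p : ||b||_2 <= R}."""
    norm = np.linalg.norm(beta)
    return beta if norm <= R else beta * (R / norm)

def sgd_logistic(X, y, R, k, rng=rng):
    """Projected SGD for f(beta) = (1/n) sum_i log(1 + exp(-y_i x_i^T beta)),
    with fixed step size eta = R / (L sqrt(k)), where L = max_i ||x_i||_2,
    returning the average of the k iterates."""
    n, p = X.shape
    L = np.linalg.norm(X, axis=1).max()
    eta = R / (L * np.sqrt(k))
    beta = np.zeros(p)
    avg = np.zeros(p)
    for _ in range(k):
        i = rng.integers(n)                           # sample one data point
        margin = y[i] * (X[i] @ beta)
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))  # gradient of sampled loss
        beta = project_l2_ball(beta - eta * grad, R)
        avg += beta / k
    return avg

X = rng.normal(size=(200, 5))
truth = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = np.where(X @ truth + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)
print(sgd_logistic(X, y, R=2.0, k=5000))
```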
Paper 1, Section II, J
Part II, 2021

Let $\mathcal{H}$ be a family of functions $h : \mathcal{X} \to \{0, 1\}$ with $|\mathcal{H}| \geq 2$. Define the shattering coefficient $s(\mathcal{H}, n)$ and the VC dimension $\mathrm{VC}(\mathcal{H})$ of $\mathcal{H}$.
Briefly explain why if $\mathcal{H}' \subseteq \mathcal{H}$ and $n \geq 1$, then $s(\mathcal{H}', n) \leq s(\mathcal{H}, n)$.
Prove that if $\mathcal{V}$ is a vector space of functions $g : \mathcal{X} \to \mathbb{R}$ with $\dim(\mathcal{V}) = d$ and we define

$$\mathcal{H} = \{\mathbf{1}_{\{g \geq 0\}} : g \in \mathcal{V}\},$$

then $\mathrm{VC}(\mathcal{H}) \leq d$.
Let $\mathcal{A}$ be the set of all spheres in $\mathbb{R}^p$, that is, sets of the form $\{x \in \mathbb{R}^p : \|x - a\|_2 \leq r\}$ for $a \in \mathbb{R}^p$ and $r > 0$. Suppose $\mathcal{H} = \{\mathbf{1}_A : A \in \mathcal{A}\}$. Show that

$$\mathrm{VC}(\mathcal{H}) \leq p + 2.$$

[Hint: Consider the class of functions $g_{a,r} : \mathbb{R}^p \to \mathbb{R}$, where

$$g_{a,r}(x) = \|x\|_2^2 - 2a^Tx + \|a\|_2^2 - r^2.]$$
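In one dimension the bound reads $\mathrm{VC}(\mathcal{H}) \leq 3$, and spheres are just intervals $\{x : |x - a| \leq r\}$. The brute-force sketch below (illustrative, not from the exam) confirms that two points can be shattered while three collinear points cannot, since the labelling $(1, 0, 1)$ is impossible for an interval; this is consistent with the bound above.

```python
import numpy as np
from itertools import product

def shattered_by_balls_1d(points, centres, radii):
    """Brute-force check of shattering by 1-D balls {x : |x - a| <= r},
    searching over grids of centres a and radii r."""
    realised = set()
    for a, r in product(centres, radii):
        realised.add(tuple(int(abs(x - a) <= r) for x in points))
    return len(realised) == 2 ** len(points)

centres = np.linspace(-5.0, 5.0, 201)
radii = np.linspace(0.01, 10.0, 100)
print(shattered_by_balls_1d([0.0, 1.0], centres, radii))        # True
print(shattered_by_balls_1d([0.0, 1.0, 2.0], centres, radii))   # False
```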
Paper 2, Section II, J
Part II, 2021

(a) What is meant by the subdifferential $\partial f(x)$ of a convex function $f : \mathbb{R}^d \to \mathbb{R}$ at $x \in \mathbb{R}^d$? Write down the subdifferential of the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \lambda|x|$, where $\lambda > 0$.
Show that $x^*$ minimises $f$ if and only if $0 \in \partial f(x^*)$.
What does it mean for a function $f : \mathbb{R}^d \to \mathbb{R}$ to be strictly convex? Show that any minimiser of a strictly convex function must be unique.
(b) Suppose we have input-output pairs $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^p \times \mathbb{R}$ with $p \geq 2$. Consider the objective function

$$f(\beta) = \frac{1}{2n}\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \lambda\|\beta\|_1,$$

where $\beta = (\beta_1, \ldots, \beta_p)^T \in \mathbb{R}^p$ and $\lambda > 0$. Assume that $\frac{1}{n}\sum_{i=1}^n x_{ij}^2 = 1$ for each $j$. Fix $j \in \{1, \ldots, p\}$ and $\beta_{-j} = (\beta_k)_{k \neq j}$, and define

$$z_j = \frac{1}{n}\sum_{i=1}^n x_{ij}\Big(y_i - \sum_{k \neq j} x_{ik}\beta_k\Big),$$

where $x_{ij}$ denotes the $j$th coordinate of $x_i$ for each $i$. Show that if $\hat{\beta}_j$ minimises $\beta_j \mapsto f(\beta)$ with $\beta_{-j}$ held fixed, then

$$\hat{\beta}_j = \operatorname{sgn}(z_j)\max(|z_j| - \lambda, 0).$$
[You may use any results from the course without proof, other than those whose proof is asked for directly.]
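The subdifferential calculation in part (b) yields the soft-thresholding update that underlies coordinate descent for $\ell_1$-penalised least squares. A minimal sketch, assuming the objective and column scaling reconstructed above; `soft_threshold`, `lasso_coordinate_descent`, and the synthetic data are illustrative names and choices, not from the course.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding S_lam(z) = sgn(z) max(|z| - lam, 0), which comes
    out of the subdifferential of lam * |x| in part (a)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/(2n)) ||y - X beta||_2^2 + lam ||beta||_1,
    assuming each column of X is scaled so that (1/n) sum_i x_ij^2 = 1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            partial = y - X @ beta + X[:, j] * beta[j]  # residual excluding j
            z_j = X[:, j] @ partial / n
            beta[j] = soft_threshold(z_j, lam)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X /= np.sqrt((X ** 2).mean(axis=0))       # enforce the scaling assumption
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)
print(lasso_coordinate_descent(X, y, lam=0.1))
```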
Paper 4, Section II, J
Part II, 2021

Let $D$ be a dataset of $n$ input-output pairs lying in $\mathbb{R}^p \times [-M, M]$ for some $M > 0$. Describe the random-forest algorithm as applied to $D$ using $B$ decision trees to produce a fitted regression function $\hat{f}_{\mathrm{RF}}$. [You need not explain in detail the construction of decision trees, but should describe any modifications specific to the random-forest algorithm.]
Briefly explain why, for each $x \in \mathbb{R}^p$ and $b \in \{1, \ldots, B\}$, we have $\hat{f}_b(x) \in [-M, M]$, where $\hat{f}_b$ denotes the regression function fitted by the $b$th tree in the forest.
State the bounded-differences inequality.
Treating $D$ as deterministic, show that with probability at least $1 - \delta$,

$$|\hat{f}_{\mathrm{RF}}(x) - \mathbb{E}\hat{f}_{\mathrm{RF}}(x)| \leq M\sqrt{\frac{2\log(2/\delta)}{B}},$$

where the expectation is taken over the randomness in the random-forest algorithm.

[Hint: Treat each $\hat{f}_b$ as a random variable taking values in an appropriate space (of functions), and consider a function $g$ of these $B$ random variables satisfying the bounded differences property with constants $c_b = 2M/B$.]
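A compact implementation makes the structure of the argument visible: the forest is an average of $B$ bounded tree predictions, each driven by independent randomisation (bootstrap resampling and random feature subsets), which is exactly the setting for the bounded differences inequality. A minimal sketch under stated assumptions; the use of scikit-learn's `DecisionTreeRegressor`, the parameter choices, and the synthetic data are illustrative, not from the exam.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)

def fit_random_forest(X, y, B=100, rng=rng):
    """Fit B regression trees, each on a bootstrap resample of the data and
    with a random subset of features considered at each split (max_features),
    which are the modifications specific to random forests."""
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(n, size=n)                 # bootstrap resample
        tree = DecisionTreeRegressor(max_features="sqrt",
                                     random_state=int(rng.integers(2**31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # Each tree predicts an average of training responses in a leaf, so its
    # prediction lies in [-M, M] whenever all responses do; the forest
    # prediction, being a further average, inherits the same bound.
    return np.mean([t.predict(X) for t in trees], axis=0)

X = rng.uniform(-1.0, 1.0, size=(300, 5))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=300)
forest = fit_random_forest(X, y)
print(predict_forest(forest, X[:3]))
```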